
[WIP] Interval Types RFC #7078

Open
rajayush143 wants to merge 5 commits into delta-io:master from rajayush143:interval_type_rfc



@rajayush143 commented Jun 23, 2026


Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

This change adds an RFC proposal for interval types. The proposal introduces native support for storing year-month and day-time interval values in Delta tables and updates the Delta protocol with a new reader-writer table feature for interval types. It does not add support for statistics.

The design decision is documented in #7077.

How was this patch tested?

N/A

Does this PR introduce any user-facing changes?

No

@rajayush143 marked this pull request as ready for review June 25, 2026 00:01

# Interval Types Table Feature

This table feature (`intervalTypes`) adds the year-month and day-second interval types from ANSI SQL:

Member


Maybe link to a definition of interval types. https://docs.databricks.com/aws/en/sql/language-manual/data-types/interval-type would be one option, but maybe there's a more generic SQL reference. I didn't find one in a 20-second search, but there should be one somewhere :)
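To make the reader-writer nature of `intervalTypes` concrete, here is a minimal Python sketch of a Delta `protocol` action that enables it. It assumes the general table-feature rules of the Delta protocol (a reader-writer feature is listed in both `readerFeatures` and `writerFeatures`, with `minReaderVersion` 3 and `minWriterVersion` 7). The helpers `protocol_with_interval_types` and `reader_can_read` are hypothetical, and the compatibility check is deliberately simplified.

```python
import json

# Feature name from the RFC; the version numbers follow the general
# table-features rules for a reader-writer feature.
INTERVAL_FEATURE = "intervalTypes"


def protocol_with_interval_types() -> dict:
    """Hypothetical helper: build a `protocol` action enabling intervalTypes.

    A reader-writer feature appears in both readerFeatures and writerFeatures.
    """
    return {
        "protocol": {
            "minReaderVersion": 3,
            "minWriterVersion": 7,
            "readerFeatures": [INTERVAL_FEATURE],
            "writerFeatures": [INTERVAL_FEATURE],
        }
    }


def reader_can_read(protocol: dict, supported_reader_features: set) -> bool:
    """Hypothetical, simplified check: every listed reader feature must be supported."""
    required = set(protocol["protocol"].get("readerFeatures", []))
    return required <= supported_reader_features


action = protocol_with_interval_types()
print(json.dumps(action))
print(reader_can_read(action, {"intervalTypes"}))
print(reader_can_read(action, set()))
```

A reader that does not know `intervalTypes` would see it in `readerFeatures` and refuse the table, rather than misreading interval columns as plain integers.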

Comment thread protocol_rfcs/interval-types.md Outdated

## Per-file Statistics

Interval columns do not support `minValues`/`maxValues` statistics or data skipping. Writers must not record `minValues` or `maxValues` for interval columns, and readers must not perform data skipping over interval columns. The per-column `nullCount` and the per-file `numRecords` statistics are unaffected and are still recorded as normal, since they do not require interpreting interval values. This is consistent with existing tables that contain interval types, which do not record `minValues`/`maxValues` for these columns.

Member


Suggested change
Interval columns do not support `minValues`/`maxValues` statistics or data skipping. Writers must not record `minValues` or `maxValues` for interval columns, and readers must not perform data skipping over interval columns. The per-column `nullCount` and the per-file `numRecords` statistics are unaffected and are still recorded as normal, since they do not require interpreting interval values. This is consistent with existing tables that contain interval types, which do not record `minValues`/`maxValues` for these columns.
Interval columns do not support `minValues`/`maxValues` statistics or data skipping. Writers must not record `minValues` or `maxValues` for interval columns, and readers must not perform data skipping over interval columns. The per-column `nullCount` and the per-file `numRecords` statistics are unaffected and are still recorded as normal, since they do not require interpreting interval values.

I don't think we need to note this
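The statistics rule in this thread can be illustrated with a small sketch of how a writer might assemble per-file stats. It assumes a flat schema given as a mapping from column name to Delta type-name string, and it treats any type name starting with `interval ` as an interval column. `file_stats` and `is_interval` are hypothetical helpers, not part of any Delta client.

```python
import json

# Assumption: every interval type-name string starts with "interval ".
INTERVAL_PREFIX = "interval "


def is_interval(type_name: str) -> bool:
    return type_name.startswith(INTERVAL_PREFIX)


def file_stats(schema: dict, rows: list) -> dict:
    """Hypothetical helper: per-file stats for a flat schema.

    schema maps column name -> Delta type-name string; rows are dicts.
    """
    stats = {"numRecords": len(rows), "minValues": {}, "maxValues": {}, "nullCount": {}}
    for col, type_name in schema.items():
        values = [r.get(col) for r in rows]
        # nullCount is recorded for every column, including interval columns.
        stats["nullCount"][col] = sum(v is None for v in values)
        if is_interval(type_name):
            continue  # writers must not record minValues/maxValues for intervals
        non_null = [v for v in values if v is not None]
        if non_null:
            stats["minValues"][col] = min(non_null)
            stats["maxValues"][col] = max(non_null)
    return stats


schema = {"id": "long", "wait": "interval day to second"}
rows = [{"id": 1, "wait": 5_000_000}, {"id": 2, "wait": None}]
print(json.dumps(file_stats(schema, rows)))
```

For the sample rows, `nullCount` covers both columns while `minValues`/`maxValues` contain only `id`, so a reader has no interval bounds it could use for data skipping.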

Comment thread protocol_rfcs/interval-types.md Outdated
- `interval year to month` is stored as a Parquet `int32` holding the signed count of months.
- `interval day to second` is stored as a Parquet `int64` holding the signed count of microseconds.

Because no Parquet logical type is written, an interval column is physically indistinguishable from a Parquet `int32`/`int64` (i.e. a Delta `integer`/`long`); the interval semantics are carried solely by the Delta schema in `Metadata.schemaString`. This representation supports signed intervals and microsecond precision while matching the physical layout of existing interval tables.

Member


Suggested change
Because no Parquet logical type is written, an interval column is physically indistinguishable from a Parquet `int32`/`int64` (i.e. a Delta `integer`/`long`); the interval semantics are carried solely by the Delta schema in `Metadata.schemaString`. This representation supports signed intervals and microsecond precision while matching the physical layout of existing interval tables.
Because no Parquet logical type is written, an interval column is physically indistinguishable from a Parquet `int32`/`int64` (i.e. a Delta `integer`/`long`); the interval semantics are carried solely by the Delta schema in `Metadata.schemaString`. This representation supports signed intervals and microsecond precision.
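The physical encoding described in this thread comes down to simple integer arithmetic. The sketch below shows one way a client might convert between interval values and the stored `int32`/`int64`, using Python's `datetime.timedelta` for day-time values. The function names and the overflow policy (raising `OverflowError`) are assumptions for illustration.

```python
import datetime

# interval year to month -> int32 signed count of months
# interval day to second -> int64 signed count of microseconds
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def encode_year_month(years: int, months: int) -> int:
    """Hypothetical helper: total signed months, checked against int32."""
    total = years * 12 + months
    if not INT32_MIN <= total <= INT32_MAX:
        raise OverflowError("year-month interval does not fit in int32")
    return total


def encode_day_time(delta: datetime.timedelta) -> int:
    """Hypothetical helper: total signed microseconds, checked against int64."""
    # timedelta stores days, seconds and microseconds as exact integers,
    # so this avoids floating-point rounding.
    total = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    if not INT64_MIN <= total <= INT64_MAX:
        raise OverflowError("day-time interval does not fit in int64")
    return total


def decode_day_time(micros: int) -> datetime.timedelta:
    """Hypothetical helper: stored int64 microseconds back to a timedelta."""
    return datetime.timedelta(microseconds=micros)
```

For example, `encode_year_month(1, 2)` returns `14` and `encode_day_time(datetime.timedelta(hours=-1))` returns `-3600000000`. Because the stored values are plain integers, a reader that ignored the Delta schema would see only these numbers, which is why the interval semantics must come from `Metadata.schemaString`.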

Comment thread protocol_rfcs/interval-types.md Outdated

## Error Conditions

- **Unrecognized type-name strings.** Type-name matching is case-sensitive. A reader that encounters an interval type-name string that is not one of the recognized canonical or narrowed spellings, including a mixed-family spelling such as `interval month to day`, or a case variant such as `INTERVAL Year To Month`, must reject the schema with an error rather than silently coercing it to a supported type.

Member


Suggested change
- **Unrecognized type-name strings.** Type-name matching is case-sensitive. A reader that encounters an interval type-name string that is not one of the recognized canonical or narrowed spellings, including a mixed-family spelling such as `interval month to day`, or a case variant such as `INTERVAL Year To Month`, must reject the schema with an error rather than silently coercing it to a supported type.
- **Unrecognized type-name strings.** Type-name matching is case-sensitive. A reader that encounters an interval type-name string that is not one of the recognized canonical or narrowed spellings, including a mixed-family spelling such as `interval month to day` (not one of `year to month` or `day to second`), or a case variant such as `INTERVAL Year To Month`, must reject the schema with an error rather than silently coercing it to a supported type.
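A reader enforcing the case-sensitive matching described in this thread could check type names against an explicit allow-list and reject anything else. The RFC excerpt here does not list the narrowed spellings, so this sketch assumes they follow Spark's interval type names (`interval <start>` or `interval <start> to <end>` within a single family). `RECOGNIZED` and `validate_interval_type` are hypothetical.

```python
# Assumption: narrowed spellings follow Spark's naming; this set is illustrative.
YEAR_MONTH_FIELDS = ["year", "month"]
DAY_TIME_FIELDS = ["day", "hour", "minute", "second"]


def _spellings(fields: list) -> set:
    """All 'interval <start>' and 'interval <start> to <end>' names in one family."""
    names = set()
    for i, start in enumerate(fields):
        names.add(f"interval {start}")
        for end in fields[i + 1:]:
            names.add(f"interval {start} to {end}")
    return names


RECOGNIZED = _spellings(YEAR_MONTH_FIELDS) | _spellings(DAY_TIME_FIELDS)


def validate_interval_type(type_name: str) -> str:
    """Return type_name unchanged if recognized; otherwise raise ValueError.

    Matching is exact and case-sensitive: no lowercasing, trimming or
    reordering is attempted.
    """
    if type_name not in RECOGNIZED:
        raise ValueError(f"unrecognized interval type: {type_name!r}")
    return type_name
```

`validate_interval_type("interval year to month")` returns the name unchanged, while `interval month to day` (mixed families) and `INTERVAL Year To Month` (wrong case) raise `ValueError` because neither appears in the allow-list. Rejecting outright, rather than normalizing, matches the RFC's requirement not to silently coerce an unknown name into a supported type.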


2 participants